neural network memorize
What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation
Deep learning algorithms are well-known to have a propensity for fitting the training data very well and often fit even outliers and mislabeled data points. Such fitting requires memorization of training data labels, a phenomenon that has attracted significant research interest but has not been given a compelling explanation so far. A recent work of Feldman (2019) proposes a theoretical explanation for this phenomenon based on a combination of two insights. First, natural image and data distributions are (informally) known to be long-tailed, that is have a significant fraction of rare and atypical examples. Second, in a simple theoretical model such memorization is necessary for achieving close-to-optimal generalization error when the data distribution is long-tailed.
Supplemental Material for What Neural Networks Memorize and Why A Proof of Lemma 2.1
We now compute the expected squared error of each of the terms of this estimator. In both cases the squared error is at most 1 / 4 . We implement our algorithms with Tensorflow [1]. Our implementation achieves 73% top-1 accuracy when trained on the full training set. For DenseNet, we halved the batch size and learning rate due to higher memory load of the architecture.
Review for NeurIPS paper: What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation
Weaknesses: I would like to see some clarification on the long tail theory. If the value of mem(A,S,i_1,...,i_k) is high, perhaps we can still call this phenomenon "memorization." If so, then memorization phenomenon is not just limited to long tails. Then, it seems to me the claim in [12] that memorization is needed due to long tail may not be showing a bigger picture. The paper mentions that very high influence scores are due to near duplicates in the training and test examples.
What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation
Deep learning algorithms are well-known to have a propensity for fitting the training data very well and often fit even outliers and mislabeled data points. Such fitting requires memorization of training data labels, a phenomenon that has attracted significant research interest but has not been given a compelling explanation so far. A recent work of Feldman (2019) proposes a theoretical explanation for this phenomenon based on a combination of two insights. First, natural image and data distributions are (informally) known to be long-tailed, that is have a significant fraction of rare and atypical examples. Second, in a simple theoretical model such memorization is necessary for achieving close-to-optimal generalization error when the data distribution is long-tailed.